Genetic differentiation of Plasmodium vivax duffy binding protein in Ethiopia and comparison with other geographical isolates

Background Plasmodium vivax Duffy binding protein (PvDBP) is a merozoite surface protein located in the micronemes of P. vivax. The invasion of human reticulocytes by P. vivax merozoites depends on the parasite DBP binding domain engaging Duffy Antigen Receptor for Chemokine (DARC) on these red blood cells (RBCs). PvDBPII shows high genetic diversity which is a major challenge to its use in the development of a vaccine against vivax malaria. Methods A cross-sectional study was conducted from February 2021 to September 2022 in five study sites across Ethiopia. A total of 58 blood samples confirmed positive for P. vivax by polymerase chain reaction (PCR) were included in the study to determine PvDBPII genetic diversity. PvDBPII were amplified using primers designed from reference sequence of P. vivax Sal I strain. Assembling of sequences was done using Geneious Prime version 2023.2.1. Alignment and phylogenetic tree constructions using MEGA version 10.1.1. Nucleotide diversity and haplotype diversity were analysed using DnaSP version 6.12.03, and haplotype network was generated with PopART version 1.7. Results The mean age of the participants was 25 years, 5 (8.6%) participants were Duffy negatives. From the 58 PvDBPII sequences, seven haplotypes based on nucleotide differences at 8 positions were identified. Nucleotide diversity and haplotype diversity were 0.00267 ± 0.00023 and 0.731 ± 0.036, respectively. Among the five study sites, the highest numbers of haplotypes were identified in Arbaminch with six different haplotypes while only two haplotypes were identified in Gambella. The phylogenetic tree based on PvDBPII revealed that parasites of different study sites shared similar genetic clusters with few exceptions. Globally, a total of 39 haplotypes were identified from 223 PvDBPII sequences representing different geographical isolates obtained from NCBI archive. The nucleotide and haplotype diversity were 0.00373 and 0.845 ± 0.015, respectively. The haplotype prevalence ranged from 0.45% to 27.3%. Two haplotypes were shared among isolates from all geographical areas of the globe. Conclusions PvDBPII of the Ethiopian P. vivax isolates showed low nucleotide but high haplotype diversity, this pattern of genetic variability suggests that the population may have undergone a recent expansion. Among the Ethiopian P. vivax isolates, almost half of the sequences were identical to the Sal-I reference sequence. However, there were unique haplotypes observed in the Ethiopian isolates, which does not share with isolates from other geographical areas. There were two haplotypes that were common among populations across the globe. Categorizing population haplotype frequency can help to determine common haplotypes for designing an effective blood-stage vaccine which will have a significant role for the control and elimination of P. vivax. Supplementary Information The online version contains supplementary material available at 10.1186/s12936-024-04887-1.


Background
In 2022, there were an estimated of 6.9 million cases of Plasmodium vivax globally [1].Plasmodium vivax malaria is recognized as a cause of severe morbidity and mortality, with a significant negative impact on health in endemic countries [2][3][4].The unique biological features of P. vivax, such as hypnozoite formation and early gametocytogenesis, pose a challenge to global malaria elimination efforts as traditional control measures may not be as effective.A P. vivax vaccine would provide a powerful tool for malaria control [5].
Despite its African origins, the presence of P. vivax on the continent is unevenly distributed, with East Africa, particularly the horn of Africa experiencing endemicity and significant clinical disease [6].In Ethiopia, the impact of P. vivax on public health is substantial, given the country's elevated malaria burden.P. vivax constitutes a significant portion of malaria cases, and its transmission occurs seasonally in highland areas where the Anopheles mosquito, the primary vector, thrives [7].This distribution pattern is likely linked to the prevalence of Duffy positive individuals, as P. vivax is believed to rely on the Duffy receptor for invasion of reticulocytes and the manifestation of clinical disease.These findings underscore the significance of studying P. vivax in Ethiopia, emphasizing its prevalence and the need for further research in this context.
The invasion pathway of P. vivax into reticulocytes was previously believed to primarily depend on the interaction between the Duffy binding protein (PvDBP) and its receptor, the Duffy Antigen Receptor for Chemokines (DARC) [8][9][10].P. vivax Duffy binding protein (PvDBP) is a merozoite surface protein of 140 kDa located in the micronemes of P. vivax, and is used to invade RBCs [11].The erythrocyte-binding motif of DBP resides in the N-terminal conserved cysteine-rich domain (C1-C12) known as DBP region II [12][13][14].Region II of PvDBP1 (PvDBPII) is 330 amino acids in length and contains 12 cysteines that are conserved among different DBL domains.Since it has conserved cysteine-rich domain, PvDBPII is one of the most promising vaccine targets [15,16].The central stretch of P. vivax region II (cysteines at positions 4-7) is important for receptor recognition and critical for binding to DARCs on human RBCs [15,17].Genetic variation analysis indicated that PvDBPII shows high genetic diversity [18] and has a 4 times higher substitution rate compared to the rest of the gene [19,20].The region involved in RBC binding is highly polymorphic and there is evidence of strain specific protection [21].The current vaccine development is based on the DBPII haplotype of the P. vivax laboratory-adapted strain Sal-1 [11].So identifying the genetic diversity in PvDBPII and its dominant haplotypes is important for the rational design of DBP-based vaccine against vivax malaria.This study aims to assess the genetic diversity of PvDBPII across various study sites in Ethiopia and comparison with isolates from other regions of the world.

Study sites and sample collection
A health-facility based cross-sectional study was conducted from February 2021 to September 2022 at five hospitals: Arbaminch, Dubti, Gambella, Metehara, and Shewarobit hospitals (Fig. 1).Finger prick blood sample was collected from each malaria suspected febrile individuals for the purpose of doing malaria laboratory diagnosis using microscopy and Rapid Diagnostic Tests (RDTs).In order to maximize the accuracy of research participant inclusion, patients who were diagnosed positive for P. vivax by both RDT and microscopy were included in the study.Parasite densities were determined on thick blood film by malaria microscopists by considering a minimum of 200 WBCs, the reported number of parasites per microliter of blood, and an assumed total white blood cell count of 8000/μL.Venous blood samples were collected and spotted onto Whatman 903 Protein Saver Card, US (Dried Blood Spot, DBS).Genomic DNA (parasite and human DNA) was extracted from a 6 mm diameter spot of a DBS sample and eluted in 100 µl of TE Buffer (Tris-EDTA) using the QIAamp DNA Extraction Kit according to the manufacturer's instructions.To avoid contamination, the scissor blade (puncher) and forceps were immersed in the RNASE AWAY solution for 1 min and then cleaned with distilled water in between each sample.In addition to the microscopy and RDT, malaria parasites identification was confirmed by Bio-Rad CFX96 Real-Time PCR [22].
the Ethiopian P. vivax isolates, almost half of the sequences were identical to the Sal-I reference sequence.However, there were unique haplotypes observed in the Ethiopian isolates, which does not share with isolates from other geographical areas.There were two haplotypes that were common among populations across the globe.Categorizing population haplotype frequency can help to determine common haplotypes for designing an effective blood-stage vaccine which will have a significant role for the control and elimination of P. vivax.

Amplifications and sequencing of PvDBPII
P. vivax samples collected at five hospitals were evaluated for PvDBPII polymorphism.The amplification of PvDBP sequences utilized primers targeting the DBP1 region II, (forward: 5′-GAT ATT GAT CAT AAG AAA ACG ATC TCT AGT -3′; reverse: 5′-TGT CAC AAC TTC CTG AGT ATT TTT TTT AGC CTC-3′) [23], designed based on the reference sequence P. vivax PVX_110810 from of Sal I (Reference assembly: NC_009911.1).The amplification was carried out in a 20-μL reaction mixture containing 2 μL of genomic DNA, 10 μL of 2 × DreamTaq Green PCR Master Mix (Fermentas), and 0.5 μM primer.Amplification reactions were performed in a Bio-Rad T100 thermal cycler, with an initial denaturation at 94 °C for 2 min, followed by 35 cycles at 94 °C for 30 s, 58 °C for 30 s, and 72 °C for 90 s with a final 5-min extension at 72 °C.PCR products were purified prior to Sanger sequencing using Sephadex G-50 (Merck, Darmstadt, Germany) and purified amplicons were then used as templates for cycle sequencing reaction, using the Applied Biosystems BigDye Terminator v3.1 Cycle Sequencing Kit, according to manufacturers' protocols.Sequences were detected on the SeqStudio genetic analyzer (Applied Biosystems, California, USA) and the data were generated as Fasta format for analysis.In order to ensure quality, positive and negative controls were used in each PCR run, and gel electrophoresis was performed to visualize the amplified product.

Genetic diversity and statistical analysis
Sequences of PvDBPII genes were trimmed and assembled using De Novo assembler of Geneious Prime version 2023.2.1 [24].Pairwise and multiple aliment was done using ClustalW alignment algorithm in Diversity Molecular evolutionary Genetic Analysis (MEGA X).Missing values across the column of the total sequences were removed using Jalview version 2.11.2 (a multiple sequence alignment editor and analysis workbench) [25].All ambiguous positions were removed for each sequence pair (pairwise deletion option).Nucleotide substitution (r) pattern and rates were estimated under the Tamura-Nei model, and phylogenetic tree was built to see evolutionary divergence between Sequences using MEGA version 10.1.1 software [26].Nucleotide diversity (π), number of segregating sites (S), the number of haplotypes (H), haplotype diversity (Hd), the total number of Fig. 1 A map of the study sites, Ethiopia mutations (η), and the average number of nucleotide differences (k) were analysed using DnaSP version 6.12.03 software [27].Frequencies were analysed using Statistical Package for Social Sciences (SPSS, version 25).P-values less than 0.05 were considered as statistically significant.

Nucleotides and amino acid frequencies of PvDBPII gene sequences
Analysis was done using the consensus sequence of 736 bps which corresponds to nucleotide positions from 838 to 1573 of the Salvador I reference.The nucleotide frequencies of the 58 sequences were 40.36% (A), 26.06% (T), 11.66% (C), and 21.92% (G).The mean percent of GC in assembled sequence was 33.5%.The amino acid with highest frequency was leucine (L) (9.11) followed by alanine (A) 97.69%, whereas the frequency of tryptophan (W) (1.43%) was the least (Table 2).

Single nucleotide polymorphism (SNP) of PvDBPII among the Ethiopian isolates
The number of segregating sites were 8, with a nucleotide diversity of 0.00267 ± 0.00023 and an average of 1.791 pairwise nucleotide differences (k).The nucleotide diversity per study site was 0.00295, 0.00177, 0.00216, 0.00363, and 0.0022 among the isolates from Metehara, Gambella, Dubti, ArbaMinch, and ShewaRobit, respectively (Table 3).A total of 8 mutations were observed, consisting of 4 singleton mutations (present in only one sequence) at positions 12, 43, 149, and 164, and 4 parsimony informative sites (with at least two types of nucleotides or occurring with a minimum frequency of two) at positions 274, 316, 414, and 433.From the 58 PvDBPII sequences, 7 haplotypes were identified (GenBank accession numbers: PP141174-PP141180), yielding a haplotype diversity of 0.731 ± 0.036.Among the 58 P. vivax isolates from Ethiopia, 41.38% (24 sequences) matched the Sal-1 reference sequence, with the majority originating from Gambella (37.5% of the 24).All identified haplotypes exhibited dimorphism, except for the predominant haplotype 1, which accounted for 41.38% (24 out of 58) of the Ethiopian isolates (Fig. 2).Of the total sequences, the Duffy status of five isolates were negative, and of these most of them (80% of the 5) were identified as haplotype 1 and the other one was haplotype 4 (see Additional file 1).

Diversity and evolutionary history among isolates from different study sites
Within the five study sites, Arbaminch exhibited the highest diversity, housing 6 distinct haplotypes, while Gambella had only 2. Haplotype 2 was exclusive to Metehara, while Arbaminch contained haplotypes 5, 6, and 7. Notably, three haplotypes (5.17%) were represented by a single sequence.The nucleotide diversity of PvDBPII in Arbaminch (π = 0.00363) surpassed that of the other sites (Table 3).The Neighbour-Joining method was employed to infer the evolutionary history, with evolutionary distances calculated using the Tajima-Nei method, measured in  the number of base substitutions per site.The phylogenetic tree, based on PvDBPII SNPs, indicated a shared profile among parasites from different study sites, with a few exceptions.Notably, the majority of isolates from Gambella (9 out of 12, or 75%) exhibited a close relationship with the Sal-1 reference strain, while isolates from Arbaminch showed the least relationship (3 out of 11, or 27%) (Fig. 3).

Genetic differentiation and haplotypes network among global isolates
A total of 39 haplotypes were identified from 223 PvDB-PII gene sequences, consisting of 58 sequences from Ethiopia and 165 obtained from the NCBI archive (global isolates).The average number of nucleotide differences (k) was 2.411, with 40 segregating sites, a nucleotide diversity of 0.00373, and a haplotype diversity of 0.845 ± 0.015.The genetic differentiation between Ethiopian isolates and those from Iran was minimal, as indicated by an F ST value of 0.00.However, there was moderate differentiation between Ethiopian isolates and those from Sudan, with an FST value of 0.0697.Ethiopian isolates were most differentiated from PNG isolates (0.1846).Notably, isolates from Papua New Guinea exhibited significant genetic differentiation from those in Sudan and Mexico, with FST values of 0.3051 and 0.3336, respectively (Table 4).
In the network analysis, the haplotype prevalence ranged from 0.45% to 27.3%.Both haplotypes H1 (Sal-1 like) and H4 were present across isolates from all geographical areas of the globe.Haplotypes H5 and H9 were shared among all geographic locations except those from Iran and Brazil, respectively.The most prevalent haplotype H1 (27.3%) was shared with multiple populations followed by H4 (23.8%).Twenty-six haplotypes (66.67%) were represented by a single sequence.Three haplotypes (H10, H11, and H12) were exclusive to Ethiopian isolates, not shared with those from other geographical areas.The network comprised a total of 48 connections among haplotypes, with 36 (75%) differing by one nucleotide, and 12 (25%) varying by two nucleotides (Fig. 4).

Discussion
The analysis of PvDBPII gene sequences from diverse geographical regions sheds light on several noteworthy patterns of parasite evolution and implications, such as evaluation of malaria vaccine candidates [34].The range of observed haplotype prevalence, spanning from 0.45% to 27.3%, serves as an indicator of the inherent genetic diversity within P. vivax populations.This observation is particularly relevant considering that the ongoing vaccine development is solely grounded in the DBPII haplotype of the laboratory-adapted P. vivax strain Sal-1 [11].This study was the first to report polymorphism on the isolates from five different malaria endemic sites in the country.PvDBPII polymorphisms were investigated in 58 P. vivax isolates from endemic study sites to expand the knowledge on PvDBPII gene population diversity in Ethiopia and compare the data with those from other endemic countries in the world.In general, this study identified 7 haplotypes from Ethiopian isolates and 39 haplotypes from the global isolates.
In this study, seven haplotypes were identified from a pool of 58 PvDBPII sequences.This represents a relatively low number of haplotypes compared to a prior study in Ethiopia where nine distinct haplotypes were discerned from 23 isolates [38].Although the phylogenetic tree based on PvDBPII SNPs indicated shared profiles among parasites from different study sites, with few exceptions, the haplotype diversity varied across sites, ranging from 0.8545 in Arbaminch to 0.4091 in Gambella.This may be due to the difference in transmission intensity of P. vivax at different part of the country [35].Among the 58 Ethiopian sequences, the highest number which was 24 (41.4%)sequences were identical to Sal-1 reference sequence and shared with multiple populations.The identification of haplotypes identical to the Sal-1 reference sequence, and those shared across populations, may hold significance for vaccine development [37].The DBPII Sal-1 variant is currently utilized in the development of a PvDBP-based vaccine, making these haplotypes potentially attractive candidates for further vaccine research and development [40].
For development of PvDBP-based vaccine it is important to assess the levels of DBPII genetic diversity among P. vivax field isolates globally [11].Comparing the genetic differentiation among isolates from different regions provides insights into the evolutionary dynamics of P. vivax.
The negligible genetic differentiation between Ethiopian isolates and those from Iran suggests a degree of gene flow, shared ancestry or stronger functional constraints on PvDBPII [35].In contrast, moderate differentiation with isolates from Sudan indicates regional distinctions in parasite populations.Notably, isolates from Papua New Guinea display substantial genetic differentiation from those in Sudan and Mexico, suggesting distinct evolutionary trajectories in these regions and or the geographic barrier between the countries inhibiting gene flow [38].
The haplotype network configuration of PvDBPII provided the information about intragenic recombination in these gene fragments.Network analysis further elucidates the relationships among haplotypes, revealing both single and multiple nucleotide variations.A haplotype network was constructed to establish the relationships among the PvDBPII haplotypes from the 223 isolates of six malaria endemic countries from different continents in the world.A total of 39 haplotypes were identified with haplotype prevalence ranging from 0.45% to 27.3%.Such a finding advanced the knowledge on the parasite population dynamics in this country for the rational design of effective interventions to block disease transmission [37].There were three haplotypes only with the Ethiopian isolates, which does not shared with isolates of different geographical areas.Existing of such unique geographical sequences, may be due to biogeographic limitations, or may be due to the possible recent introduction of the parasite to that geographic area [34].

Conclusion
This study highlights the intricate genetic landscape of PvDBPII in P. vivax populations in Ethiopia and globally.The prevalence and distribution of haplotypes, regional variations, and genetic differentiation underscore the complex interplay of factors shaping the evolutionary dynamics of this parasite.Understanding these dynamics is crucial for tailoring effective malaria control strategies and advancing the knowledge on P. vivax biology.Such insights enable investigators to select dominant haplotypes crucial for designing an effective blood-stage vaccine thereby playing a pivotal role in the control and elimination of P. vivax.

Fig. 2
Fig. 2 Single nucleotides polymorphism of PvDBPII among Ethiopian isolates.A Position of nucleotide changes.Polymorphic nucleotides are listed for each haplotype.Nucleotides identical to those of the reference sequence, Sal-I (NC_009911.1), are marked in green.The dimorphic nucleotide changes are marked in red color.Y-axis represent haplotypes list in the left while the total number of sequences for each haplotype at right panel (D: + ve; Duffy positive, and D:-ve; Duffy negative).X-axis represent nucleotides position.B Frequencies (%) of nucleotide changes found in PvDBPII among 58 Ethiopian isolates.Frequencies in percent denoted on the Y-axis, and nucleotide changes on the X-axis

Fig. 4
Fig. 4 Networking of PvDBPII obtaining from Ethiopian P. vivax isolates and isolates from different malaria endemic countries; Brazil, Iran, Mexico, Papua New Guinea, and Sudan

Table 1
Demographic information of study participants

Table 3
Nucleotide and haplotype diversity of PvDBPII gene and number of isolates per haplotypes among different Ethiopian populations

Table 4
Nucleotide and haplotype diversity of PvDBPII gene and Fst of global populationsPopulations from Brazil (BR), Ethiopia (ET), Iran (IR), Mexico (ME),Papua New Guinea (PNG) and Sudan (SU).n; number of isolates, H; Number of haplotypes, Hd; Haplotype diversity; π; nucleotide diversity, F ST ; Fixation index, a measure of genetic differentiation between populations